Firmament: Fast, Centralized Cluster Scheduling at Scale

Authors

Ionel Gog

Malte Schwarzkopf

Adam Gleave

Robert N. M. Watson

Steven Hand

Abstract

Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at subsecond placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×.
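To make the min-cost max-flow formulation concrete, the following is a minimal sketch of task placement as a flow problem. It is not Firmament's implementation: the abstract says Firmament combines multiple MCMF algorithms and solves incrementally, whereas this sketch uses a plain successive-shortest-paths solver run from scratch. The names `MinCostFlow`, `schedule`, `costs`, `slots`, and `unscheduled_cost`, the node layout, and the "unscheduled" aggregator node are all assumptions made for illustration.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # per-node lists of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index; its residual twin is index ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)
        return len(self.to) - 2

    def solve(self, s, t):
        flow = total = 0
        while True:
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:  # sink unreachable: flow is maximal
                return flow, total
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the cheapest path
                push = min(push, self.cap[prev_edge[v]])
                v = self.to[prev_edge[v] ^ 1]
            v = t
            while v != s:  # augment along the path
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total += push * dist[t]


def schedule(costs, slots, unscheduled_cost):
    """costs[task][machine] is a placement cost; slots[machine] is free capacity.

    Hypothetical layout: source -> task -> machine -> sink, plus a
    task -> unscheduled -> sink path so every task can always route flow.
    """
    tasks, machines = list(costs), list(slots)
    source, sink, unsched = 0, 1, 2
    t_id = {t: 3 + i for i, t in enumerate(tasks)}
    m_id = {m: 3 + len(tasks) + i for i, m in enumerate(machines)}
    g = MinCostFlow(3 + len(tasks) + len(machines))
    g.add_edge(unsched, sink, len(tasks), 0)
    for m in machines:
        g.add_edge(m_id[m], sink, slots[m], 0)
    arcs = {}
    for t in tasks:
        g.add_edge(source, t_id[t], 1, 0)
        g.add_edge(t_id[t], unsched, 1, unscheduled_cost)
        for m, c in costs[t].items():
            arcs[(t, m)] = g.add_edge(t_id[t], m_id[m], 1, c)
    _, total = g.solve(source, sink)
    placement = {t: None for t in tasks}
    for (t, m), e in arcs.items():
        if g.cap[e] == 0:  # saturated task->machine arc means "placed here"
            placement[t] = m
    return placement, total
```

Calling `schedule` with a per-task cost table, a per-machine slot count, and a penalty for leaving a task unplaced returns the chosen placement and its total cost. Rerunning it whenever the cluster changes mimics, very crudely, the continuous rescheduling the abstract describes; doing that from scratch is exactly the expense that incremental solving is meant to avoid.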


Similar resources

Dynamic Load Balancing Through Coordinated Scheduling in Packet Data Systems

Third generation code-division multiple access (CDMA) systems propose to provide packet data service through a high speed shared channel with intelligent and fast scheduling at the base-stations. In the current approach base-stations schedule independently of other base-stations. We consider scheduling schemes in which scheduling decisions are made jointly for a cluster of cells thereby enhanci...


Partitioned Parallel Job Scheduling for Extreme Scale Computing

Recent success in building petascale computing systems poses new challenges in job scheduling design to support cluster sizes that can execute up to two million concurrent tasks. We show that for these extreme scale clusters the resource demand at a centralized scheduler can exceed the capacity or limit the ability of the scheduler to perform well. This paper introduces partitioned scheduling, ...


Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters

Datacenter-scale computing for analytics workloads is increasingly common. High operational costs force heterogeneous applications to share cluster resources for achieving economy of scale. Scheduling such large and diverse workloads is inherently hard, and existing approaches tackle this in two alternative ways: 1) centralized solutions offer strict, secure enforcement of scheduling invariants...


Failure Prediction and Scalable Checkpointing for Reliable Large-Scale Grid Computing

Computational clusters, the grids that federate them, and the applications that utilize their significant computing potential, all continue to grow with advances in hardware technology, cluster management, and grid middleware solutions. As they do, the likelihood that large-scale long-running grid and cluster applications will have to deal with underlying node unavailability and cluster failure...


Centrally Controlled Clustered Wireless Sensor Networks

We present IMPERIA, a centrally managed architecture for large-scale wireless sensor networks (WSN). Within the WSN, sensor nodes communicate using a clustered multihop TDMA protocol, which globally synchronizes the network and collects data at ultra-low power consumption. The novel contributions to the state-of-the-art include a) an efficient algorithm for network topology discovery and link q...




Publication date: 2016
